Compiling, Assembling, and Linking on UNIX

part 2

Here we have a number of source files (A.c ... Z.c).
Each of these files is compiled and assembled, so now we have the object files A.o ... Z.o.
Every object file lists all the symbols that it defines and all the symbols that it references.
Typically an object file will reference some symbols that it does not define, and define some symbols that only other files reference.
The canonical example of this is when one file calls a function defined in another file.
This happens in Java all the time. You should be used to it by now.
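To make "lists all the symbols that it defines and all the symbols that it references" concrete, here is a minimal sketch in C. The ObjFile struct and the defines_symbol / count_undefined functions are hypothetical; real object-file formats such as ELF store this information quite differently. The sketch just counts the symbols a file uses but does not define.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified model of an object file's symbol lists.
   Real formats (e.g. ELF) keep one symbol table with per-symbol flags. */
typedef struct {
    const char *name;               /* e.g. "P.o" */
    const char *const *defines;     /* symbols this file defines */
    size_t ndefines;
    const char *const *references;  /* symbols this file uses */
    size_t nreferences;
} ObjFile;

/* Returns 1 if obj defines sym, 0 otherwise. */
int defines_symbol(const ObjFile *obj, const char *sym) {
    for (size_t i = 0; i < obj->ndefines; i++)
        if (strcmp(obj->defines[i], sym) == 0)
            return 1;
    return 0;
}

/* Counts the symbols obj references but does not define: the ones
   some other object file (or a library) will have to supply. */
size_t count_undefined(const ObjFile *obj) {
    assert(obj != NULL);
    size_t n = 0;
    for (size_t i = 0; i < obj->nreferences; i++)
        if (!defines_symbol(obj, obj->references[i]))
            n++;
    return n;
}
```

If P.o defines main and references kissMyGrits and printf, count_undefined reports 2 for it, while a K.o that defines kissMyGrits and references nothing reports 0.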
Suppose that P.o calls the function kissMyGrits(), which is defined in K.o.
This call will be a jal (jump and link) instruction in P.o.
When the assembler is working on P.s, it can see that kissMyGrits is not defined there.
It writes the instruction as "jal 0" and records in P.o (in a relocation entry) that the address of kissMyGrits should replace the 0.
Why zero? Well, have you got a better suggestion?
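The assembler's side of this can be sketched in C, assuming classic 32-bit MIPS, where jal is opcode 3 in the top 6 bits and the low 26 bits hold the target address divided by 4. The Reloc struct and the encode_jal / assemble_external_call names are hypothetical; real object formats record relocations with more detail (relocation type, section, addend).

```c
#include <assert.h>
#include <stdint.h>

#define MIPS_JAL_OPCODE 3u   /* jal = 000011 in bits 31..26 */

/* Encode "jal target" (assumes target lies in the same 256 MB region
   as the instruction, since jal keeps the upper 4 address bits). */
uint32_t encode_jal(uint32_t target) {
    return (MIPS_JAL_OPCODE << 26) | ((target >> 2) & 0x03FFFFFFu);
}

/* Hypothetical relocation record: "put the address of symbol into
   the instruction at offset". */
typedef struct {
    uint32_t offset;      /* byte offset of the instruction in the text */
    const char *symbol;   /* whose address should replace the 0 */
} Reloc;

/* Assemble a call to a symbol this file does not define: emit "jal 0"
   and return the record telling the linker to patch it later. */
Reloc assemble_external_call(uint32_t *text, uint32_t offset, const char *symbol) {
    assert(offset % 4 == 0);            /* MIPS instructions are word-aligned */
    text[offset / 4] = encode_jal(0);   /* the placeholder "jal 0" */
    Reloc r = { offset, symbol };
    return r;
}
```

For example, encode_jal(0) is 0x0C000000: that is the "jal 0" word sitting in P.o, waiting for the linker to overwrite its low 26 bits.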
The linker is the first tool in all of this that knows where kissMyGrits actually ends up in the executable image.
It starts by reading the symbol tables in all the object files and making a grand central symbol table from them.
When it needs the address of kissMyGrits, it will look for it in the grand central symbol table.
Then it replaces the 0 with the address from the grand central symbol table.
And it does this for all of the undefined symbols.
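The linker's half can be sketched the same way. This assumes the grand central symbol table has already been merged from all the object files, that relocation offsets are already relative to the final executable image, and classic 32-bit MIPS jal encoding; Symbol, Reloc, lookup, and apply_relocations are hypothetical names, not any real linker's interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entries of the grand central symbol table. */
typedef struct { const char *name; uint32_t address; } Symbol;

/* Hypothetical relocation: patch the jal at this image offset. */
typedef struct { uint32_t offset; const char *symbol; } Reloc;

/* MIPS jal: opcode 3 in bits 31..26, target/4 in the low 26 bits. */
uint32_t encode_jal(uint32_t target) {
    return (3u << 26) | ((target >> 2) & 0x03FFFFFFu);
}

/* Look name up in the grand central symbol table.
   Returns 1 and stores the address in *addr if found, else 0. */
int lookup(const Symbol *table, size_t n, const char *name, uint32_t *addr) {
    for (size_t i = 0; i < n; i++) {
        if (strcmp(table[i].name, name) == 0) {
            *addr = table[i].address;
            return 1;
        }
    }
    return 0;
}

/* Replace the 0 in every "jal 0" whose symbol is in the table.
   Returns how many relocations are still unresolved. */
size_t apply_relocations(uint32_t *image, const Reloc *relocs, size_t nrelocs,
                         const Symbol *table, size_t nsyms) {
    size_t unresolved = 0;
    for (size_t i = 0; i < nrelocs; i++) {
        uint32_t addr;
        assert(relocs[i].offset % 4 == 0);
        if (lookup(table, nsyms, relocs[i].symbol, &addr))
            image[relocs[i].offset / 4] = encode_jal(addr);  /* patch it */
        else
            unresolved++;   /* still undefined: left for the libraries */
    }
    return unresolved;
}
```

Relocations whose symbol is not in the table (printf, say) are counted and left as "jal 0"; those are the ones the linker still has to find somewhere else.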
Almost always there are still some undefined symbols at the end of this process. A good example is printf.
libc.a is an archive (thus the ".a" on the end of the name) that contains a lot of object files and an index to locate symbols in them.
These object files make up a library of functions that are useful in C programs (thus the "lib" and the "c" in the name).
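An archive really is just a container: on UNIX systems an ar archive begins with the 8-byte magic string "!<arch>\n", followed by its member files. The sketch below checks for that magic and then uses a hypothetical in-memory index (an array of symbol/member pairs) to show what the index buys the linker: it can find the member that defines a symbol without opening every object file. IndexEntry, is_ar_archive, and index_lookup are illustrative names, and the real on-disk index format differs between systems.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AR_MAGIC "!<arch>\n"
#define AR_MAGIC_LEN 8

/* Returns 1 if buf starts with the ar archive magic string. */
int is_ar_archive(const unsigned char *buf, size_t len) {
    return len >= AR_MAGIC_LEN && memcmp(buf, AR_MAGIC, AR_MAGIC_LEN) == 0;
}

/* Hypothetical in-memory form of the archive's symbol index. */
typedef struct {
    const char *symbol;   /* e.g. "printf" */
    const char *member;   /* object file in the archive that defines it */
} IndexEntry;

/* Returns the member that defines symbol, or NULL if the archive lacks it. */
const char *index_lookup(const IndexEntry *index, size_t n, const char *symbol) {
    assert(index != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (strcmp(index[i].symbol, symbol) == 0)
            return index[i].member;
    return NULL;
}
```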
While it is doing the relocating, the linker will look in this archive for the undefined symbols.
This will turn all the undefined symbols into defined symbols. Well, we hope.
It also copies the object files that define those symbols out of the archive and adds them to the executable image.
Generally, when code is pulled from an archive like this, there will be some new undefined symbols, so the linker recursively goes looking for them.
If it can't find those, then there is something wrong with the archive.
If there are still undefined symbols at the end of this process, it will punt back to you, and you get to try again.
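The "copy in a member, then go looking for its undefined symbols too" process can be sketched as a loop that repeats until a full pass pulls in nothing new, which reaches the same result as the recursive search described above. Member, SymState, and resolve_from_archive are hypothetical, and the toy archive in the usage note (printf.o needing write.o) is invented for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SYMS 64

/* Hypothetical archive member: one object file inside an archive.
   Both lists are NULL-terminated (at most 3 names each). */
typedef struct {
    const char *name;
    const char *defines[4];
    const char *references[4];
} Member;

/* Symbols defined so far, and symbols still waiting for a definition. */
typedef struct {
    const char *defined[MAX_SYMS];
    size_t ndefined;
    const char *undefined[MAX_SYMS];
    size_t nundefined;
} SymState;

static int contains(const char *const *list, size_t n, const char *s) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(list[i], s) == 0)
            return 1;
    return 0;
}

static void add_unique(const char **list, size_t *n, const char *s) {
    if (contains(list, *n, s))
        return;
    assert(*n < MAX_SYMS);
    list[(*n)++] = s;
}

static void remove_sym(const char **list, size_t *n, const char *s) {
    for (size_t i = 0; i < *n; i++) {
        if (strcmp(list[i], s) == 0) {
            list[i] = list[--(*n)];   /* order does not matter */
            return;
        }
    }
}

/* Copy in every member that defines a still-undefined symbol, adding that
   member's own references, until a full pass pulls nothing new.
   Sets pulled[m] = 1 for each member copied; returns how many symbols
   are still undefined (nonzero means the linker punts back to you). */
size_t resolve_from_archive(SymState *st, const Member *ar, size_t nmembers, int *pulled) {
    int progress = 1;
    while (progress) {
        progress = 0;
        for (size_t m = 0; m < nmembers; m++) {
            if (pulled[m])
                continue;
            int needed = 0;
            for (size_t d = 0; ar[m].defines[d] != NULL; d++)
                if (contains(st->undefined, st->nundefined, ar[m].defines[d]))
                    needed = 1;
            if (!needed)
                continue;
            pulled[m] = 1;
            progress = 1;
            for (size_t d = 0; ar[m].defines[d] != NULL; d++) {
                add_unique(st->defined, &st->ndefined, ar[m].defines[d]);
                remove_sym(st->undefined, &st->nundefined, ar[m].defines[d]);
            }
            for (size_t r = 0; ar[m].references[r] != NULL; r++)
                if (!contains(st->defined, st->ndefined, ar[m].references[r]))
                    add_unique(st->undefined, &st->nundefined, ar[m].references[r]);
        }
    }
    return st->nundefined;
}
```

With a toy archive in which printf.o references write, asking for printf pulls in printf.o and then write.o, while an unrelated strlen.o stays out of the executable image.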
There are other libraries available (e.g. libm.a, the math library), but you have to ask for them, nicely.
libc.a is the archive of last resort: once the linker has found all the symbols that it can in the object modules and archives that it has been given, it looks for the rest in libc.a automatically. You don't have to ask it to.
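The naming convention behind "asking nicely" can be sketched too: for static linking, a flag -lNAME names the archive libNAME.a, so the math library (libm.a) is requested with -lm. This sketch assumes the archives are searched in the order they were requested, with libc.a appended at the end as the archive of last resort (in practice the cc driver usually adds it to the linker's command line for you). archive_for_flag and search_order are hypothetical helpers, and real linkers also search a list of directories for each archive.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define NAME_LEN 32

/* Map "-lNAME" to "libNAME.a". Returns 1 on success, 0 if flag is not
   a -l flag or the resulting name does not fit in out. */
int archive_for_flag(const char *flag, char *out, size_t outsz) {
    if (strncmp(flag, "-l", 2) != 0 || flag[2] == '\0')
        return 0;
    int n = snprintf(out, outsz, "lib%s.a", flag + 2);
    return n > 0 && (size_t)n < outsz;
}

/* Fill out[] with the archives to search, in order: the ones asked for,
   then libc.a last. Returns the number of entries written. */
size_t search_order(const char *const *flags, size_t nflags,
                    char out[][NAME_LEN], size_t max) {
    assert(max > 0);
    size_t k = 0;
    for (size_t i = 0; i < nflags && k + 1 < max; i++)  /* keep room for libc.a */
        if (archive_for_flag(flags[i], out[k], NAME_LEN))
            k++;
    strcpy(out[k], "libc.a");   /* archive of last resort, added automatically */
    return k + 1;
}
```

Given the flags -lm and -O2, only -lm names an archive, so the search order comes out as libm.a followed by libc.a.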
This process we have been describing is called static linking; each executable image contains its own copy of the code that it uses from the archive.
This makes every executable image bigger. People used to worry about all the extra disk space this was taking up.
In dynamic linking there is only one copy of each library's code (on UNIX, a shared library such as libc.so rather than an archive), and all the programs that use it share it.
Dynamic linking is what is almost always used now (where do you think the Windows extension ".dll" comes from?).
Static linking is much simpler, so we start with it.